A Probabilistic Model for Knowledge Component Naming

Authors

  • Cyril Goutte
  • Serge Léger
  • Guillaume Durand

Abstract

Recent years have seen significant advances in the automatic identification of the Q-matrix necessary for cognitive diagnostic assessment. As data-driven approaches are introduced to identify latent knowledge components (KCs) from observed student performance, it becomes crucial to describe and interpret these latent KCs. We address the problem of naming knowledge components using keywords automatically extracted from item text. Our approach identifies the most discriminative keywords based on a simple probabilistic model. We show that this is effective on a dataset from the PSLC DataShop, where it outperforms baselines and retrieves unknown skill labels in nearly 50% of cases.

1. OVERVIEW

The Q-matrix, introduced by Tatsuoka [9], associates test items with the student attributes that the test is intended to assess. A number of data-driven approaches have been introduced to identify the Q-matrix automatically by mapping items to latent knowledge components (KCs) based on observed student performance [1, 6], using, e.g., matrix factorization [2, 8], clustering [5] or sparse factor analysis [4]. A crucial issue with automatic methods is that latent skills may be hard to describe and interpret. Manually designed Q-matrices may also be insufficiently described. A data-generated description is useful in both cases. We propose to extract keywords relevant to each KC from the textual content of each item, and we build a simple probabilistic model with which we score those keywords. This proves surprisingly effective on a small dataset obtained from the PSLC DataShop.

2. MODEL

We focus on extracting keywords from the textual content of each item (question, hints, feedback; Fig. 1). We denote by d_i the textual content (e.g., body text) of item i, and assume a Q-matrix mapping items to K skills c_k, k = 1, ..., K. These may be latent skills obtained automatically or skills taken from a manually designed Q-matrix.

Figure 1: Example item body, feedback and hints.
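As a concrete illustration of this setup, the sketch below assumes a binary Q-matrix stored as a list of rows (q[i][k] = 1 when item i is mapped to skill c_k, with 0-based indices) and item texts d_i given as plain strings; it pools the words of all items assigned to each skill. The function names, the whitespace tokenizer and the toy data are hypothetical and chosen for illustration only.

```python
def tokenize(text):
    # Hypothetical tokenizer (an assumption): lowercase, split on whitespace only.
    return text.lower().split()


def texts_per_kc(q_matrix, item_texts):
    """Pool the tokens of each item text d_i into every skill c_k it maps to.

    q_matrix[i][k] == 1 means item i is assigned to skill k (binary Q-matrix).
    Returns {k: list of tokens} for k = 0 .. K-1 (0-based indices here).
    """
    num_skills = len(q_matrix[0])
    pooled = {k: [] for k in range(num_skills)}
    for row, text in zip(q_matrix, item_texts):
        tokens = tokenize(text)
        for k, assigned in enumerate(row):
            if assigned:
                pooled[k].extend(tokens)
    return pooled


# Toy example (hypothetical data): 3 items, K = 2 skills; the third item uses both.
q = [[1, 0],
     [0, 1],
     [1, 1]]
items = ["Compute the area of the circle",
         "Find the slope of the line",
         "Area under the line"]
pooled = texts_per_kc(q, items)
```

An item mapped to several skills (a row with more than one 1) contributes its text to each of those skills, so the per-skill word pools may overlap.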
For each KC, we build a unigram language model estimating the relative frequency of words in that KC [7]:
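A minimal sketch of such a per-KC unigram model, assuming the standard maximum-likelihood (relative-frequency) estimate P(w | c_k) = n(w, c_k) / Σ_w' n(w', c_k), where n(w, c_k) counts occurrences of word w in the pooled texts of the items mapped to c_k. To rank keywords, it also adds one possible discriminativeness score: a smoothed log-ratio between the probability of w in c_k and its probability in the texts of all other skills. This score, the smoothing parameter `alpha`, the notation n(w, c_k) and the function names are illustrative assumptions, not necessarily the authors' choices.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: P(w | c_k) = n(w, c_k) / sum_w' n(w', c_k)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def keyword_scores(kc_tokens, k, alpha=1.0):
    """Rank the words of skill k by a smoothed log-ratio
    log P(w | c_k) - log P(w | all other skills).

    ASSUMPTION: this log-ratio is illustrative, not necessarily the paper's score.
    alpha is add-alpha smoothing, so words absent from one side never give log(0).
    """
    inside = Counter(kc_tokens[k])
    outside = Counter(t for j, toks in kc_tokens.items() if j != k for t in toks)
    vocab_size = len(set(inside) | set(outside))
    n_in = sum(inside.values())
    n_out = sum(outside.values())
    scores = {}
    for w in inside:  # candidate keywords: only words occurring in skill k
        p_in = (inside[w] + alpha) / (n_in + alpha * vocab_size)
        p_out = (outside[w] + alpha) / (n_out + alpha * vocab_size)
        scores[w] = math.log(p_in) - math.log(p_out)
    # Highest score first; ties keep their first-occurrence order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy pooled tokens per skill (hypothetical data).
kc_tokens = {
    0: "compute the area of the circle area under the line".split(),
    1: "find the slope of the line area under the line".split(),
}
model = unigram_model(kc_tokens[0])     # relative word frequencies for skill 0
top = keyword_scores(kc_tokens, 0)[:3]  # three highest-scoring words for skill 0
```

In this toy example, a word that is equally frequent in both pools (such as "the") scores zero, while words specific to skill 0 ("compute", "circle") rank first, which is the kind of discriminative behaviour the keyword ranking is meant to capture.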

Similar resources

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

A Probabilistic Model of Learning Fields in Islamic Economics and Finance

In this paper an epistemological model of learning fields of probabilistic events is formalized. It is used to explain resource allocation governed by pervasive complementarities as the sign of unity of knowledge. Such an episteme is induced epistemologically into interacting, integrating and evolutionary variables representing the problem at hand. The end result is the formalization of a p...

A Supervised Probabilistic Principal Component Analysis Mixture Model in a Lossless Dimensionality Reduction Framework for Face Recognition

In this paper, we first proposed the supervised version of probabilistic principal component analysis mixture model. Then, we consider a learning predictive model with projection penalties, as an approach for dimensionality reduction without loss of information for face recognition. In the proposed method, first a local linear underlying manifold of data samples is obtained using the supervised...

Naming, Definition, and Etiology of Rabies in the Medical Texts of the Islamic Civilization

Background and purpose: Rabies is an ancient lethal disease in human civilization that endangers the life of many of its victims in case of ineffective treatments. Although the definition, etiology, semiology, and treatment of rabies exist in Islamic medical texts, but they are unintentionally or sometimes intentionally being neglected. This article aimed at studying the knowledge of the physic...

Publication date: 2015